Bilingual Terminology Extraction Using Multi-level Termhood
Abstract
Purpose: Terminology is the set of technical words or expressions used in specific contexts. It denotes the core concepts of a formal discipline and is commonly applied in machine translation, information retrieval, information extraction and text categorization. Bilingual terminology extraction plays an important role in bilingual dictionary compilation, bilingual ontology construction, machine translation and cross-language information retrieval. This paper addresses monolingual terminology extraction and bilingual term alignment based on multi-level termhood.

Design/methodology/approach: A method based on multi-level termhood is proposed. The new method computes the termhood of both the terminology candidate and the sentence that contains it by corpus comparison. Because terminologies and general words usually have different distributions in a corpus, termhood can also be used to constrain and improve term alignment when aligning bilingual terms on a parallel corpus. Bilingual term alignment based on termhood constraints is presented.

Findings: Experimental results show that multi-level termhood performs better than existing methods for terminology extraction. When termhood is used as a constraining factor, the performance of bilingual term alignment improves.

Originality/value: The termhood of the candidate terminology and of the sentence that contains it are both used for terminology extraction; this is called multi-level termhood. Multi-level termhood is computed by corpus comparison. The experimental results show that multi-level termhood performs better than the standard method. A bilingual term alignment method based on termhood constraints is put forward, and termhood is applied to the task of bilingual terminology extraction. Experimental results show that termhood constraints can improve the performance of terminology alignment to some extent.
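The abstract does not state the termhood formula, so the following is only a minimal sketch of the general idea of scoring by corpus comparison. It assumes a smoothed log-ratio of relative frequencies between a domain corpus and a general corpus, a sentence score taken as the mean word score, and a linear combination of the two levels. The function names, the `weight` parameter and the single-token candidate are all illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter


def make_termhood(domain_tokens, general_tokens):
    """Return a word-scoring function based on a domain-vs-general comparison.

    Assumption: termhood is the log-ratio of add-one-smoothed relative
    frequencies. The paper's actual formula is not given in the abstract.
    """
    domain, general = Counter(domain_tokens), Counter(general_tokens)
    vocab_size = len(set(domain) | set(general))
    domain_total = sum(domain.values()) + vocab_size
    general_total = sum(general.values()) + vocab_size

    def termhood(word):
        p_domain = (domain[word] + 1) / domain_total
        p_general = (general[word] + 1) / general_total
        return math.log(p_domain / p_general)

    return termhood


def sentence_termhood(sentence_tokens, termhood):
    """Mean word-level termhood of a sentence (0.0 for an empty sentence)."""
    if not sentence_tokens:
        return 0.0
    return sum(termhood(w) for w in sentence_tokens) / len(sentence_tokens)


def multi_level_termhood(candidate, sentence_tokens, termhood, weight=0.5):
    """Hypothetical combination of candidate-level and sentence-level scores."""
    sentence_score = sentence_termhood(sentence_tokens, termhood)
    return weight * termhood(candidate) + (1 - weight) * sentence_score


# Toy corpora, purely for illustration.
domain_corpus = "the ontology alignment uses the ontology".split()
general_corpus = "the cat sat on the mat".split()
score = make_termhood(domain_corpus, general_corpus)
print(score("ontology") > score("the") > score("cat"))
```

Under these assumptions, a word that is relatively more frequent in the domain corpus scores above zero, a word with the same smoothed relative frequency in both corpora scores zero, and a word typical of general text scores below zero.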
Related resources
Termhood-Based Comparability Metrics of Comparable Corpus in Special Domain
Cross-Language Information Retrieval (CLIR) and machine translation (MT) resources, such as dictionaries and parallel corpora, are scarce and hard to come by for special domains. Moreover, these resources are limited to a few languages, such as English, French, and Spanish. Obtaining comparable corpora automatically for such domains could therefore be an answer to this problem effective...
A Study on Terminology Extraction Based on Classified Corpora
Algorithms for automatic term extraction in a specific domain should consider at least two issues, namely unithood and termhood (Kageura, 1996). Unithood refers to the degree to which a string occurs as a word or a phrase. Termhood (Chen Yirong, 2005) refers to the degree to which a word or a phrase occurs as a domain-specific concept. Unlike unithood, study on termhood is not yet widely reported. In cla...
Terminology-driven Augmentation of Bilingual Terminologies
This paper proposes a way of augmenting bilingual terminologies by using a “generate and validate” method. Using existing bilingual terminologies, the method generates “potential” bilingual multi-word term pairs and validates their status by searching web documents to check whether such terms actually exist in each language. Unlike most existing bilingual term extraction methods, which use para...
Pattern Based Term Extraction Using ACABIT System
In this paper, we propose a pattern-based term extraction model for Japanese that applies the ACABIT system developed for French. The proposed model evaluates termhood using morphological patterns of basic terms and term variants. After extracting term selections, the ACABIT system filters non-terms out of the selections based on a simple log-likelihood evaluation. This approach would be suitable to Japanese t...
Mutual Bilingual Terminology Extraction
This paper describes a novel methodology to perform bilingual terminology extraction, in which automatic alignment is used to improve the performance of terminology extraction for each language. The strengths of monolingual terminology extraction for each language are exploited to improve the performance of terminology extraction in the other language, thanks to the availability of a sentence-l...
Journal title:
- The Electronic Library
Volume 30
Publication date: 2012